Abstract

We consider an original problem that arises from the issue of security analysis of a
power system and that we name optimal discovery with probabilistic expert advice. We
address it with an algorithm based on the optimistic paradigm and on the Good-Turing
missing mass estimator. We prove a regret bound on the performance of this algorithm
under weak assumptions on the probabilistic experts. Under more restrictive hypotheses,
we also prove a macroscopic optimality result, comparing the algorithm both with an oracle
strategy and with uniform sampling. Finally, we provide numerical experiments illustrating
these theoretical findings.

Keywords: optimal discovery, probabilistic experts, optimistic algorithm, Good-Turing
estimator, UCB

1 Introduction

In this paper we consider the following problem: let $X$ be a set, and let $A \subset X$ be a set of
interesting elements in $X$. One can access $X$ only through requests to a finite set of probabilistic
experts. More precisely, when one makes a request to the $i$-th expert, the latter draws independently
at random a point from a fixed probability distribution $P_i$ over $X$. One is interested in discovering
rapidly as many elements of $A$ as possible, by making sequential requests to the experts.



1.1 Motivation 

The original motivation for this problem arises from the issue of real-time security analysis of a
power system. This problem often amounts to identifying, in a set of credible contingencies, those
that may indeed endanger the security of the power system and perhaps lead to a system collapse
with catastrophic consequences (e.g., an entire region or country may be without electrical power
for hours). Once those dangerous contingencies have been identified, the system operators
usually take preventive actions so as to ensure that they could mitigate their effect on the
system in the event that they occur. Note that the dangerous contingencies are usually
very rare with respect to the non-dangerous ones. A straightforward approach for tackling
this security analysis problem is to simulate the power system dynamics for every credible
contingency so as to identify those that are indeed dangerous. Unfortunately, when the set
of credible contingencies contains a large number of elements (say, more than $10^5$
credible contingencies), such an approach may no longer be possible, since the computational
resources required to simulate every contingency may exceed those that are usually available
during the few (tens of) minutes available for the real-time security analysis. One is therefore left
with the problem of identifying within this short time-frame a maximum number of dangerous
contingencies rather than all of them. The approach proposed in [FB12, FBED+10] addresses
this problem by first building very rapidly what could be described as a probability distribution
$P$ over the set of credible contingencies that points with significant probability to contingencies
which are dangerous. Afterwards, this probability distribution is used to draw the contingencies
to be analyzed through simulations. When the computational resources are exhausted, the
approach outputs the contingencies found to be dangerous. One of the main shortcomings
of this approach is that $P$ usually points with significant probability only to a few of the
dangerous contingencies and not to all of them. As a result, after a few draws, this probability
distribution is no more likely to generate new dangerous contingencies than, for example, a
uniform one. The dangerous contingencies to which $P$ points with significant probability
depend however strongly on the set of (sometimes arbitrary) engineering choices that have been
made for building it. One possible strategy to ensure that more dangerous contingencies can
be identified within a limited budget of draws would therefore be to consider $K > 1$ sets of
engineering choices to build $K$ different probability distributions $P_1, P_2, \ldots, P_K$ and to draw
the contingencies from these $K$ distributions rather than only from a single one. This strategy
raises however an important question that this paper tries to answer: how should the
distributions be selected in order to generate, with a given number of draws, a maximum
number of dangerous contingencies? We consider the specific case where the contingencies are
drawn sequentially and where the distribution selected for generating a contingency at one
instant can be based on the distributions selected in the past, the contingencies that
have already been drawn and the results of the security analyses (dangerous/non-dangerous)
for these contingencies. This corresponds exactly to the optimal discovery problem with expert
advice described above. We believe that this framework has many other possible applications,
such as, for example, web-based content access.

1.2 Setting and notation 

In this paper we restrict our attention to finite or countably infinite sets $X$. We denote by
$K$ the number of experts. For each $i \in \{1,\ldots,K\}$, we assume that $(X_{i,n})_{n \ge 1}$ are random
variables with distribution $P_i$ such that the $(X_{i,n})_{i,n}$ are independent. Sequential discovery
with probabilistic expert advice can be described as follows: at each time step $t \in \mathbb{N}^*$, one picks
an index $I_t \in \{1,\ldots,K\}$, and one observes $X_{I_t, n_{I_t,t}}$, where

$$n_{i,t} = \sum_{s \le t} \mathbb{1}\{I_s = i\}.$$

The goal is to choose the $(I_t)_{t \ge 1}$ so as to observe as many elements of $A$ as possible in a fixed
horizon $t$, that is to maximize the number of interesting items found after $t$ requests

$$F(t) = \big| A \cap \{X_{1,1},\ldots,X_{1,n_{1,t}},\ldots,X_{K,1},\ldots,X_{K,n_{K,t}}\} \big|. \qquad (1)$$

Note in particular that it is of no interest to observe the same element of $A$ twice. The
index $I_{t+1}$ may be chosen according to past observations: it is a (possibly randomized) function
of $(I_1, X_{I_1,1}, \ldots, I_t, X_{I_t,n_{I_t,t}})$.

An easier quantity to analyze than the number of interesting items found $F(t)$ is the waiting
time $T(\lambda)$, $\lambda \in (0,1)$, which is the time at which the strategy has a missing mass of interesting
items smaller than $\lambda$ on every expert, that is

$$T(\lambda) = \inf\Big\{ t : \forall i \in \{1,\ldots,K\},\ P_i\big(A \setminus \{X_{1,1},\ldots,X_{1,n_{1,t}},\ldots,X_{K,1},\ldots,X_{K,n_{K,t}}\}\big) \le \lambda \Big\}. \qquad (2)$$

While we shall derive a general strategy that can be used without any assumption on the
probabilistic experts, for the mathematical analysis of the waiting time $T(\lambda)$ we make the
following assumption:

(i) non-intersecting supports: $A \cap \mathrm{supp}(P_i) \cap \mathrm{supp}(P_j) = \emptyset$ for $i \ne j$.

Furthermore we will also derive some results under the following more restrictive assumptions:

(ii) finite supports with the same cardinality: $|\mathrm{supp}(P_i)| = N$, $\forall i \in \{1,\ldots,K\}$,

(iii) uniform distributions: $P_i(x) = \frac{1}{N}$, $\forall x \in \mathrm{supp}(P_i)$, $\forall i \in \{1,\ldots,K\}$.
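
To make the two performance measures concrete, here is a minimal Python sketch of the bookkeeping behind $F(t)$ in (1) and the per-expert missing mass that appears in (2). The function names, the representation of an expert as a dictionary of probabilities, and the toy instance are assumptions made for illustration only; they are not part of the paper.

```python
def items_found(history, interesting):
    """F(t): number of distinct interesting items among the observations so far."""
    return len({x for _, x in history} & interesting)

def missing_mass(pmf, history, interesting):
    """Mass that the expert distribution `pmf` puts on interesting items not yet
    observed (on any expert): the quantity compared with lambda in (2)."""
    seen = {x for _, x in history}
    return sum(p for x, p in pmf.items() if x in interesting and x not in seen)

# Toy instance (hypothetical): two experts with disjoint supports.
experts = [{"a": 0.5, "b": 0.5}, {"c": 0.25, "d": 0.75}]
interesting = {"a", "c", "d"}
history = [(0, "a"), (1, "d"), (0, "a")]   # pairs (I_s, observed item)

found = items_found(history, interesting)
masses = [missing_mass(p, history, interesting) for p in experts]
```

In this toy run the repeated observation of "a" does not increase `found`, which illustrates why observing the same interesting element twice is of no use.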

1.3 Contribution and content of the paper 

This paper contains the description of a generic algorithm for the optimal discovery problem
with probabilistic expert advice, and a theoretical analysis of its properties. In Section 2 we first
describe our strategy, termed Good-UCB. This algorithm relies on the optimistic paradigm (which
led to the UCB (Upper Confidence Bound) algorithm for multi-armed bandits, [ACBF02], see
also [GC11]), and on a finite-time analysis of the Good-Turing estimator for the missing mass.
We also derive in Section 2 the main result of the paper, a regret bound on the performance
of Good-UCB under the non-intersecting supports assumption (i). This bound states roughly that with
high probability, $T_{UCB}(\lambda)$ (the waiting time for the strategy Good-UCB) is smaller than $T^*(\lambda')$
(the smallest possible waiting time), for some $\lambda'$ close to $\lambda$ and up to a small additional term,
see Theorem 1 for a more precise statement. In Section 3 we investigate the behavior
of Good-UCB in a macroscopic limit sense, that is we make assumptions (i), (ii) and (iii) and we
consider the limit when the size of the set $X$ grows to infinity while maintaining a constant
proportion of interesting items. In this scenario we show that Good-UCB is macroscopically
optimal, in the sense that the normalized waiting time of Good-UCB tends to the normalized
smallest possible waiting time. We also derive a formula for this latter quantity and we show
that it is equal to $\sum_{i: q_i > \lambda} \log \frac{q_i}{\lambda}$, where $q_i$ is the limiting proportion of interesting items on
expert $i$. This macroscopic limit also makes it easy to assess the performance of different strategies,
and we show that for example the normalized waiting time of uniform sampling tends to
$K \max_{1 \le i \le K} \log \frac{q_i}{\lambda}$, which proves that this strategy is macroscopically suboptimal, unless all
experts have the same number of interesting items.

Finally Section 4 reports experimental results that show that the Good-UCB algorithm
performs very well, even in a setting where assumptions (i), (ii) and (iii) are not satisfied. We
also discuss there the relation between the waiting time $T$ defined in (2) and the number of
items found $F$ defined in (1).
2 The Good-UCB algorithm

We describe here the Good-UCB strategy. This algorithm is a sequential method estimating at
time $t$, for each expert $i \in \{1,\ldots,K\}$, the total probability of the interesting items that remain
to be discovered through requests to expert $i$. This estimation is done by adapting the so-called
Good-Turing estimator for the missing mass. Then, instead of simply using the distribution
with highest estimated missing mass, which proves hazardous, we make use of the optimistic
paradigm (see [Chapter 2, BCB12] and references therein), a heuristic principle well known in
reinforcement learning, which consists in using an upper-confidence bound (UCB) on the
missing mass instead. At a given time step, the Good-UCB algorithm simply makes a request
to the expert with highest upper-confidence bound on the missing mass at this time step. We
start with the Good-Turing estimator and a brief study of its concentration properties. Then
we describe precisely the Good-UCB strategy. We end this section with a sort of 'non-linear
regret bound' for Good-UCB.

2.1 Estimating the missing mass 

Our algorithm relies on an estimation at each step of the probability of obtaining a new
interesting item by making a request to a given expert. A similar issue was addressed by I. Good
and A. Turing as part of their efforts to crack German ciphers for the Enigma machine during
World War II. In this subsection, we describe a version of the Good-Turing estimator adapted
to our problem. Let $\Omega$ be a discrete set, and let $A$ be a subset of interesting elements of $\Omega$.
Assume that $X_1, \ldots, X_n$ are elements of $\Omega$ drawn independently under the same distribution
$P$, and define for every $x \in \Omega$:

$$O_n(x) = \sum_{m=1}^{n} \mathbb{1}\{X_m = x\}, \qquad Z_n(x) = \mathbb{1}\{O_n(x) = 0\}, \qquad U_n(x) = \mathbb{1}\{O_n(x) = 1\}.$$

Let $R_n = \sum_{x \in A} Z_n(x) P(x)$ denote the missing mass of the interesting items, and let
$U_n = \sum_{x \in A} U_n(x)$ be the number of elements of $A$ that have been seen exactly once (in linguistics,
they are often called hapaxes). The idea of the Good-Turing estimator ([Goo53], see also
[MS00, OSZ03] and references therein) is to estimate the (random) "missing mass" $R_n$, which is
the total probability of all the interesting items that do not occur in the sample $X_1, \ldots, X_n$, by
the "fraction of hapaxes" $\hat{R}_n = U_n / n$. This estimator is well known in linguistics, for instance in
order to estimate the number of words in some language, see [GS95]. We shall use the following
tight bound on the estimation error. We emphasize the fact that the following bound holds true
independently of the underlying distribution $P$.
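
The estimator $\hat{R}_n = U_n/n$ only requires counting occurrences. The following Python sketch computes it together with the true missing mass $R_n$ on a small hand-made sample; the function names and the toy distribution are hypothetical and serve only as an illustration of the definitions above.

```python
from collections import Counter

def good_turing_missing_mass(sample, interesting):
    """Fraction of hapaxes: interesting items seen exactly once, divided by n."""
    counts = Counter(sample)
    hapaxes = sum(1 for x, c in counts.items() if c == 1 and x in interesting)
    return hapaxes / len(sample)

def true_missing_mass(pmf, sample, interesting):
    """R_n: total probability of the interesting items absent from the sample."""
    seen = set(sample)
    return sum(p for x, p in pmf.items() if x in interesting and x not in seen)

# Toy example (hypothetical): uniform distribution over five items.
pmf = {x: 0.2 for x in "abcde"}
interesting = {"a", "c", "e"}
sample = ["a", "b", "a", "c", "d"]

r_hat = good_turing_missing_mass(sample, interesting)   # only "c" is an interesting hapax
r_true = true_missing_mass(pmf, sample, interesting)    # only "e" is interesting and unseen
```

Note that the estimator never uses `pmf`: it is computed from the sample alone, which is what makes it usable inside a sequential algorithm.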

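Proposition 1. With probability at least $1 - \delta$,

$$\hat{R}_n - \frac{1}{n} - (1+\sqrt{2}) \sqrt{\frac{\log(4/\delta)}{n}} \le R_n \le \hat{R}_n + (1+\sqrt{2}) \sqrt{\frac{\log(4/\delta)}{n}}.$$

Proof. First we show that $\mathbb{E} R_n - \mathbb{E} \hat{R}_n \in [-\frac{1}{n}, 0]$ (this result is well known, see for example
Theorem 1 in [MS00]):

$$\mathbb{E} R_n - \mathbb{E} \hat{R}_n = \sum_{x \in A} \Big[ P(x) (1 - P(x))^n - \frac{1}{n} \times n P(x) (1 - P(x))^{n-1} \Big]$$
$$= -\frac{1}{n} \sum_{x \in A} P(x) \times n P(x) (1 - P(x))^{n-1}$$
$$= -\frac{1}{n} \mathbb{E}\Big[ \sum_{x \in A} P(x) U_n(x) \Big] \in \Big[ -\frac{1}{n}, 0 \Big].$$

Next we apply McDiarmid's inequality ([McD89]) to $\hat{R}_n$ as follows. The random variable $\hat{R}_n$ is
a function of the independent observations $X_1, \ldots, X_n$ such that, denoting $\hat{R}_n = f(X_1, \ldots, X_n)$,
modifying just one observation has limited impact: $\forall l \in \{1,\ldots,n\}$, $\forall (x_1, \ldots, x_n, x'_l) \in \Omega^{n+1}$,

$$\big| f(x_1, \ldots, x_n) - f(x_1, \ldots, x_{l-1}, x'_l, x_{l+1}, \ldots, x_n) \big| \le \frac{2}{n}.$$

Thus one gets that, with probability at least $1 - \delta$,

$$\big| \hat{R}_n - \mathbb{E}[\hat{R}_n] \big| \le \sqrt{\frac{2 \log(2/\delta)}{n}}.$$

Finally we extract the following result from Theorem 10 and Theorem 16 in [MO03]: with
probability at least $1 - \delta$,

$$\big| R_n - \mathbb{E}[R_n] \big| \le \sqrt{\frac{\log(2/\delta)}{n}},$$

which concludes the proof. □
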
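
As an illustration of Proposition 1, the sketch below computes the confidence interval around the Good-Turing estimate and checks on one simulated sample that it contains the true missing mass. The sample size, the uniform distribution over 1000 items, the choice of interesting set and the seed are arbitrary assumptions made for this example.

```python
import math
import random
from collections import Counter

def good_turing_interval(sample, interesting, delta):
    """Confidence interval for the missing mass given by Proposition 1."""
    n = len(sample)
    counts = Counter(sample)
    r_hat = sum(1 for x, c in counts.items() if c == 1 and x in interesting) / n
    width = (1 + math.sqrt(2)) * math.sqrt(math.log(4 / delta) / n)
    return r_hat - 1 / n - width, r_hat + width

# Hypothetical setting: uniform distribution on {0, ..., 999}, items below 100 are interesting.
rng = random.Random(0)
n_items, interesting = 1000, set(range(100))
sample = [rng.randrange(n_items) for _ in range(500)]

lo, hi = good_turing_interval(sample, interesting, delta=0.05)
true_mass = len(interesting - set(sample)) / n_items   # each item has probability 1/1000
```

The interval is distribution-free, so it is typically much wider than the actual estimation error, as this example suggests.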
2.2 The Good-UCB algorithm 

Following the example of the well-known Upper-Confidence Bound procedure for multi-armed
bandit problems, we propose Algorithm 1, which we call Good-UCB in reference to the estimator
it relies on. For each arm $i \in \{1,\ldots,K\}$, the index at time $t$ of Good-UCB corresponds to the
estimate

$$\hat{R}_{i,n_{i,t-1}} = \frac{1}{n_{i,t-1}} \sum_{x \in A} \mathbb{1}\Big\{ \sum_{s=1}^{n_{i,t-1}} \mathbb{1}\{X_{i,s} = x\} = \sum_{j=1}^{K} \sum_{s=1}^{n_{j,t-1}} \mathbb{1}\{X_{j,s} = x\} = 1 \Big\}$$

of the missing mass

$$\sum_{x \in A \setminus \{X_{I_1,n_{I_1,1}}, \ldots, X_{I_{t-1},n_{I_{t-1},t-1}}\}} P_i(x),$$

inflated by a confidence bonus of order $\sqrt{\log(t)/n_{i,t-1}}$. Good-UCB relies on a tuning parameter
$C$ which is discussed below.

The Good-UCB algorithm is designed to work without any assumption on the probabilistic
experts. However, for the analysis we shall make the non-intersecting supports assumption (i).
Indeed, without this assumption the missing mass of a given expert $i$ depends explicitly on
the outcomes of all requests (and not only requests to expert $i$), which makes the analysis
significantly more difficult. Nonetheless, we show in Section 4 that Good-UCB performs well in
practice even when assumption (i) is not met.
Algorithm 1 Good-UCB

1: For $1 \le t \le K$ choose $I_t = t$.

2: for $t \ge K + 1$ do

3: Choose $I_t = \arg\max_{1 \le i \le K} \Big\{ \hat{R}_{i,n_{i,t-1}} + C \sqrt{\frac{\log(4t)}{n_{i,t-1}}} \Big\}$

4: Observe $X_t$ distributed as $P_{I_t}$ and update the missing mass estimates accordingly

5: end for
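
The following Python sketch is one possible transcription of Algorithm 1, given for illustration only. The interface (experts as zero-argument sampling functions, an `is_interesting` predicate), the default value of `C`, the recomputation of the index from scratch at each step, and the toy experts are all assumptions of this sketch rather than choices made in the paper.

```python
import math
import random
from collections import Counter

def good_ucb(experts, is_interesting, horizon, C=0.5):
    """Run Good-UCB for `horizon` requests; returns the list of (expert, item) pairs."""
    K = len(experts)
    per_expert = [Counter() for _ in range(K)]   # occurrences of each item on each expert
    overall = Counter()                          # occurrences of each item on any expert
    n = [0] * K                                  # number of requests made to each expert
    history = []
    for t in range(1, horizon + 1):
        if t <= K:
            i = t - 1                            # line 1: request each expert once
        else:
            def index(j):
                # interesting items seen exactly once on expert j and exactly once overall
                u = sum(1 for x, c in per_expert[j].items()
                        if c == 1 and overall[x] == 1 and is_interesting(x))
                return u / n[j] + C * math.sqrt(math.log(4 * t) / n[j])
            i = max(range(K), key=index)         # line 3: highest upper-confidence bound
        x = experts[i]()                         # line 4: observe a draw from P_i
        per_expert[i][x] += 1
        overall[x] += 1
        n[i] += 1
        history.append((i, x))
    return history

# Hypothetical experts: uniform on 200 items each, with 50% and 5% of interesting items.
rng = random.Random(0)
experts = [lambda: rng.randrange(0, 200), lambda: rng.randrange(1000, 1200)]
is_interesting = lambda x: x < 100 or 1000 <= x < 1010
history = good_ucb(experts, is_interesting, horizon=400)
pulls = Counter(i for i, _ in history)
```

On this toy instance most requests go to the expert holding the larger share of interesting items, while the confidence bonus keeps the other expert from being abandoned.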



2.3 A regret bound for Good-UCB 

In this section we analyze Good-UCB under assumption (i). We shall derive a non-linear regret
bound as follows. For a fixed $\lambda \in (0,1)$ we consider the number of requests $T_{UCB}(\lambda)$ that
Good-UCB needs to make in order to have a missing mass of interesting items smaller than $\lambda$ on each
expert, see (2). We also consider the omniscient oracle strategy that minimizes this number of
requests, given the knowledge of $A$ and the sequence of answers to the requests $(X_{i,s})_{1 \le i \le K, s \ge 1}$.
We denote by $T^*(\lambda)$ the corresponding number of requests for this omniscient oracle strategy.
We now prove that with high probability, $T_{UCB}(\lambda)$ is smaller than $T^*(\lambda')$, for some $\lambda'$ close
to $\lambda$ and up to a small additional term. This can be viewed as a finite-time regret bound for
Good-UCB.

Theorem 1. For any $\lambda \in (0,1)$, $c > 0$ and $S \ge 1$, under assumption (i), Good-UCB (with
constant $C = (1+\sqrt{2})\sqrt{c+2}$) satisfies with probability at least $1 - \frac{K}{c S^c}$,

$$T_{UCB}(\lambda) \le T^* + K S \log\big(8 T^* + 16 K S \log(K S)\big), \quad\text{where}\quad T^* = T^*\Big( \lambda - \frac{3}{S} - 2(1+\sqrt{2}) \sqrt{\frac{c+2}{S}} \Big).$$

Informally this bound shows that Good-UCB slightly lags behind the omniscient oracle
strategy. Under more restrictive assumptions on the experts it is possible to obtain a more
explicit bound by studying the variations of $T^*$. In the next section we take another route and
we show that the above upper bound can be used to prove a clear qualitative property for
Good-UCB, namely its macroscopic optimality.

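Proof. Recall that we work under assumption (i), and we run Good-UCB with parameter
$C = (1+\sqrt{2})\sqrt{c+2}$, for some positive constant $c$. In particular, thanks to assumption (i), we can
define the missing mass and the missing mass estimate of expert $i$ after $t$ pulls as:

$$R_{i,t} = \sum_{x \in A \setminus \{X_{i,1}, \ldots, X_{i,t}\}} P_i(x) \quad\text{and}\quad \hat{R}_{i,t} = \frac{1}{t} \sum_{x \in A} \mathbb{1}\Big\{ \sum_{s=1}^{t} \mathbb{1}\{X_{i,s} = x\} = 1 \Big\}.$$

In this proof we consider the following event:

$$\xi = \Big\{ \forall i \in \{1,\ldots,K\}, \forall t > S, \forall s \le t:\ \hat{R}_{i,s} - \frac{1}{s} - (1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4t)}{s}} \le R_{i,s} \le \hat{R}_{i,s} + (1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4t)}{s}} \Big\}.$$

Using Proposition 1 and a union bound, one obtains $P(\xi) \ge 1 - \frac{K}{c S^c}$. In the following we work
on the event $\xi$. Recall that $T^*(\lambda)$ (respectively $T_{UCB}(\lambda)$) is the time at which the omniscient
oracle strategy (respectively the Good-UCB strategy) attains a missing mass smaller than $\lambda$ on
all experts. Note that $T^*(\lambda)$ and $T_{UCB}(\lambda)$ are functions of $(X_{i,s})_{1 \le i \le K, s \ge 1}$. In particular one
can write:

$$T_{UCB}(\lambda) = \min\big\{ t \ge 1 : \forall i \in \{1,\ldots,K\},\ R_{i,n_{i,t}} \le \lambda \big\},$$

$$T^*(\lambda) = \sum_{i=1}^{K} T_i^*(\lambda), \quad\text{where}\quad T_i^*(\lambda) = \min\{ t \ge 1 : R_{i,t} \le \lambda \}.$$

Let

$$U(\lambda) = \min\Big\{ t \ge 1 : \forall i \in \{1,\ldots,K\},\ \hat{R}_{i,n_{i,t}} + (1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4t)}{n_{i,t}}} \le \lambda \Big\}.$$

Let $S' \ge S$ to be defined later. On the event $\xi$ one clearly gets $T_{UCB}(\lambda) \le \max(S', U(\lambda))$.
Moreover the following sequence of inequalities holds true if $U(\lambda) > S'$ (see below for an explanation
of each inequality):

$$R_{i,n_{i,U(\lambda)}} \ge \hat{R}_{i,n_{i,U(\lambda)}} - \frac{1}{n_{i,U(\lambda)}} - (1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4U(\lambda))}{n_{i,U(\lambda)}}}$$
$$\ge \hat{R}_{i,n_{i,U(\lambda)}-1} - \frac{3}{n_{i,U(\lambda)}} - (1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4U(\lambda))}{n_{i,U(\lambda)}}}$$
$$\ge \Big( \lambda - (1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4U(\lambda))}{n_{i,U(\lambda)}-1}} \Big) - \frac{3}{n_{i,U(\lambda)}} - (1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4U(\lambda))}{n_{i,U(\lambda)}}}$$
$$\ge \lambda - \frac{3}{n_{i,U(\lambda)}} - 2(1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4U(\lambda))}{n_{i,U(\lambda)}-1}}.$$

The first inequality comes from the fact that we are on event $\xi$ and we assume $U(\lambda) > S'$. The
second inequality uses the fact that when we make a request to an expert, the number of items
uniquely seen on this expert can drop by at most one, and thus we get

$$s \hat{R}_{i,s} \ge (s-1) \hat{R}_{i,s-1} - 1 \ge s \hat{R}_{i,s-1} - 2.$$

The third inequality is the key step of the proof. Consider the time step $t$ such that
$n_{i,t} = n_{i,U(\lambda)} - 1$ and $n_{i,t+1} = n_{i,U(\lambda)}$. Since $t < U(\lambda)$ we know that one of the experts $j$ satisfies
$\hat{R}_{j,n_{j,t}} + (1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4t)}{n_{j,t}}} > \lambda$. Moreover, since Good-UCB is run with constant
$C = (1+\sqrt{2})\sqrt{c+2}$ and since we make a request to expert $i$ at time $t$, we know that it maximizes
the Good-UCB index, and thus $\hat{R}_{i,n_{i,t}} + (1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4t)}{n_{i,t}}} > \lambda$. Using that $t < U(\lambda)$
completes the proof of the third inequality. The fourth inequality is trivial.
We just proved that if $n_{i,U(\lambda)} > S'$ then

$$R_{i,n_{i,U(\lambda)}} \ge \lambda - \frac{3}{n_{i,U(\lambda)}} - 2(1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4U(\lambda))}{n_{i,U(\lambda)}-1}}.$$

In particular this directly shows that

$$n_{i,U(\lambda)} \le S' + T_i^*\Big( \lambda - \frac{3}{S'} - 2(1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4U(\lambda))}{S'}} \Big),$$

which implies

$$U(\lambda) \le K S' + T^*\Big( \lambda - \frac{3}{S'} - 2(1+\sqrt{2}) \sqrt{\frac{(c+2)\log(4U(\lambda))}{S'}} \Big)$$
$$\le K S \log(4U(\lambda)) + T^*\Big( \lambda - \frac{3}{S} - 2(1+\sqrt{2}) \sqrt{\frac{c+2}{S}} \Big),$$

where the last inequality follows by taking $S' = S \log(4U(\lambda))$. Finally using Lemma 1 (in the
appendix) and $T_{UCB}(\lambda) \le \max(S', U(\lambda))$ ends the proof. □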



3 Macroscopic limit 

In the previous section we derived a very general non-linear regret bound for Good-UCB. Here
we shall study the behavior of Good-UCB under more restrictive assumptions on the experts.
This will allow us to derive a clear qualitative statement about its performance, and it also
permits an easier comparison with other strategies such as uniform sampling. In this section we
shall make the following two assumptions in addition to assumption (i):

(ii) finite supports with the same cardinality: $|\mathrm{supp}(P_i)| = N$, $\forall i \in \{1,\ldots,K\}$,
(iii) uniform distributions: $P_i(x) = \frac{1}{N}$, $\forall x \in \mathrm{supp}(P_i)$, $\forall i \in \{1,\ldots,K\}$.

These assumptions are primarily made in order to be able to assess the performance of the
optimal strategy. In this setting it is convenient to re-parameterize slightly the problem (in
particular we make explicit the dependency on $N$ for reasons that will appear later). Let
$X^N = \{1,\ldots,K\} \times \{1,\ldots,N\}$, $A^N \subset X^N$ the set of interesting items of $X^N$, and $Q^N = |A^N|$
the number of interesting items. We assume that, for expert $i \in \{1,\ldots,K\}$, $P_i^N$ is the uniform
distribution on $\{i\} \times \{1,\ldots,N\}$. We also denote by $Q_i^N = |A^N \cap (\{i\} \times \{1,\ldots,N\})|$ the
number of interesting items accessible through requests to expert $i$. Without loss of generality,
we assume in this section that $Q_1^N \ge Q_2^N \ge \cdots \ge Q_K^N$.

The macroscopic limit that we investigate in this section corresponds to the setting where
$N$ goes to infinity together with the $Q_i^N$ in such a way that $Q_i^N / N \to q_i \in (0,1)$. For a
given strategy we are interested in the time $T^N(\lambda)$ such that all experts have at most $N\lambda$
undiscovered interesting items. In particular we define $T^N_{UCB}(\lambda)$ (respectively $T^N_*(\lambda)$) to be the
corresponding time for the Good-UCB strategy (respectively the omniscient oracle strategy).
In the macroscopic limit we shall be particularly interested in the normalized limit waiting time
$\lim_{N \to +\infty} T^N(\lambda)/N$.



3.1 Macroscopic behavior of the oracle closed-loop strategy 

In this section we shall derive an explicit upper bound on the macroscopic limit of $T^N_*(\lambda)$ by
studying another oracle strategy, the oracle closed-loop (OCL) strategy. At each time step,
OCL makes a request to one of the experts with highest number of still undiscovered interesting
items: the expert requested at time $t$ is:

$$I_t \in \arg\max_{1 \le i \le K} P_i\big( A \setminus \{X_{1,1}, \ldots, X_{1,n_{1,t}}, \ldots, X_{K,1}, \ldots, X_{K,n_{K,t}}\} \big).$$

Theorem 2. For every $\lambda \in (0, q_1)$, for every sequence $(\lambda^N)_N$ converging to $\lambda$ as $N$ goes to
infinity, under assumptions (i), (ii) and (iii), almost surely

$$\lim_{N \to \infty} \frac{T^N_{OCL}(\lambda^N)}{N} = \sum_{i: q_i > \lambda} \log \frac{q_i}{\lambda}.$$
Proof. Denote by $B_i^N$ the set of interesting items in $\{1,\ldots,N\}$ supported by $P_i^N$:
$B_i^N = \{x \in \{1,\ldots,N\} : (i,x) \in A^N\}$. Successive draws of expert $i$ are denoted $(i, X^N_{i,1}), (i, X^N_{i,2}), \ldots$,
where the variables $(X^N_{i,n})_{i,n}$ are assumed to be independent. We denote by $(D^N_{i,k})_{1 \le k \le Q_i^N}$ the
increasing sequence of the indices corresponding to draws for which new interesting items are
discovered with expert $i$:

$$D^N_{i,1} = \min\{ n \ge 1 : X^N_{i,n} \in B_i^N \},$$
$$D^N_{i,2} = \min\{ n > D^N_{i,1} : X^N_{i,n} \in B_i^N \setminus \{X^N_{i,D^N_{i,1}}\} \}, \quad \ldots$$

We also define $D^N_{i,0} = 0$ and, for $k \ge 1$, $S^N_{i,k} = D^N_{i,k} - D^N_{i,k-1}$. The random variables $S^N_{i,k}$
($1 \le i \le K$, $k \ge 1$) are independent with geometric distribution $\mathcal{G}\big((1 + Q_i^N - k)/N\big)$.

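At every step, the OCL should call the expert with maximal number of undiscovered
interesting items. Hence, it can:

• first request expert 1 for $D^N_{1,Q_1^N - Q_2^N}$ steps;

• then, alternatively request

— expert 1 for $S^N_{1,1+Q_1^N-Q_2^N}$ steps;

— expert 2 for $S^N_{2,1}$ steps;

— expert 1 for $S^N_{1,2+Q_1^N-Q_2^N}$ steps;

— expert 2 for $S^N_{2,2}$ steps;

— and so on, until there are only $Q_3^N$ undiscovered interesting items on experts 1 and 2.

• and so on, including successively experts $3, 4, \ldots, K$ in the alternation.

Obviously,

$$T^N_{OCL}(\lambda^N) = \sum_{i: Q_i^N > N\lambda^N} D^N_{i,Q_i^N - N\lambda^N}.$$

It suffices now to show that for every expert $i \in \{1,\ldots,K\}$, $D^N_{i,Q_i^N - N\lambda^N}/N$ converges almost
surely to $\log(q_i/\lambda)$ as $N$ goes to infinity. Write

$$\frac{1}{N} D^N_{i,Q_i^N - N\lambda^N} = \frac{1}{N} \mathbb{E}\big[ D^N_{i,Q_i^N - N\lambda^N} \big] + W^N_{\lambda^N}, \quad\text{where}\quad W^N_{\lambda^N} = \frac{1}{N} \sum_{k=1}^{Q_i^N - N\lambda^N} \big( S^N_{i,k} - \mathbb{E}[S^N_{i,k}] \big). \qquad (3)$$

For every positive integer $d$ and for $k \in \{1, \ldots, Q_i^N - N\lambda^N\}$, elementary manipulations of the
geometric distribution yield that

$$\mathbb{E}\Big[ \big| S^N_{i,k} - \mathbb{E}[S^N_{i,k}] \big|^d \Big] \le \mathbb{E}\Big[ \big| S^N_{i,Q_i^N - N\lambda^N} - \mathbb{E}[S^N_{i,Q_i^N - N\lambda^N}] \big|^d \Big] \le \frac{c(d)}{(\lambda^N)^d} \le \frac{2c(d)}{\lambda^d}$$

for some positive constant $c(d)$ depending only on $d$, and for $N$ large enough. Hence, taking $W^N_{\lambda^N}$
to the fourth power and developing yields

$$\mathbb{E}\Big[ \big( W^N_{\lambda^N} \big)^4 \Big] \le \frac{c'}{N^2 \lambda^4}$$

for some positive constant $c'$. Using Markov's inequality together with the Borel-Cantelli lemma,
this shows that $W^N_{\lambda^N}$ converges almost surely to $0$ as $N$ goes to infinity. But

$$\frac{1}{N} \mathbb{E}\big[ D^N_{i,Q_i^N - N\lambda^N} \big] = \sum_{k=1}^{Q_i^N - N\lambda^N} \frac{1}{Q_i^N + 1 - k} = \sum_{j=N\lambda^N+1}^{Q_i^N} \frac{1}{j} = \log \frac{Q_i^N}{N\lambda^N} - \epsilon_N,$$

with $0 \le \epsilon_N \le 1/(N\lambda^N)$ according to Lemma 2, and thus

$$\frac{1}{N} D^N_{i,Q_i^N - N\lambda^N} \to \log \frac{q_i}{\lambda} \quad\text{almost surely},$$

which concludes the proof. □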
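
Theorem 2 can be checked informally by simulation. The sketch below runs the OCL strategy under assumptions (i), (ii) and (iii) and compares the normalized waiting time with $\sum_{i: q_i > \lambda} \log(q_i/\lambda)$. The instance ($N = 2000$, two experts with $q = (0.5, 0.25)$, $\lambda = 0.1$), the seed and the encoding of interesting items as the first $Q_i$ integers are assumptions chosen for this example.

```python
import math
import random

def ocl_waiting_time(Q, N, lam, rng):
    """Number of requests made by OCL until every expert has at most N*lam
    undiscovered interesting items. Expert i is uniform on {0, ..., N-1} and its
    interesting items are encoded as {0, ..., Q[i]-1}."""
    remaining = list(Q)
    found = [set() for _ in Q]
    t = 0
    while max(remaining) > N * lam:
        # request an expert with the largest number of undiscovered interesting items
        i = max(range(len(Q)), key=lambda j: remaining[j])
        x = rng.randrange(N)
        if x < Q[i] and x not in found[i]:
            found[i].add(x)
            remaining[i] -= 1
        t += 1
    return t

N, q, lam = 2000, [0.5, 0.25], 0.1
Q = [int(N * qi) for qi in q]
t_ocl = ocl_waiting_time(Q, N, lam, random.Random(1))
limit = sum(math.log(qi / lam) for qi in q if qi > lam)
normalized = t_ocl / N
```

Under uniform distributions the missing mass of an expert is proportional to its number of undiscovered interesting items, which is why the sketch can rank experts by `remaining`.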

3.2 Macroscopic behavior of uniform sampling 

In this section we study the simple uniform sampling strategy that cycles through the experts,
i.e., at time $t$ uniform sampling makes a request to the $(t \bmod K)$-th expert. This strategy
is not macroscopically optimal unless all experts have the same number of interesting items.
Furthermore the next proposition makes precise the extent of improvement of a macroscopically
optimal strategy over uniform sampling. The proof follows exactly the same steps as the proof
of Theorem 2 and is thus omitted.

Proposition 2. For every $\lambda \in (0, q_1)$, for every sequence $(\lambda^N)_N$ converging to $\lambda$ as $N$ goes to
infinity, under assumptions (i), (ii) and (iii), almost surely

$$\lim_{N \to \infty} \frac{T^N_{US}(\lambda^N)}{N} = K \log \frac{q_1}{\lambda}.$$
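
The two macroscopic limits (Theorem 2 for the oracle closed-loop strategy, Proposition 2 for uniform sampling) are easy to compare numerically. The sketch below evaluates both formulas; the function names and the example proportions are hypothetical.

```python
import math

def oracle_limit(q, lam):
    """Normalized waiting time of the OCL strategy: sum over q_i > lam of log(q_i / lam)."""
    return sum(math.log(qi / lam) for qi in q if qi > lam)

def uniform_limit(q, lam):
    """Normalized waiting time of uniform sampling: K * log(q_1 / lam), q_1 = max(q)."""
    return len(q) * math.log(max(q) / lam)

unbalanced = (oracle_limit([0.5, 0.25], 0.1), uniform_limit([0.5, 0.25], 0.1))
balanced = (oracle_limit([0.3, 0.3], 0.1), uniform_limit([0.3, 0.3], 0.1))
```

With equal proportions the two limits coincide, in line with the remark that uniform sampling is suboptimal unless all experts have the same number of interesting items.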

3.3 Macroscopic optimality of Good-UCB 

Using the regret bound of Theorem 1 we obtain the following corollary, which shows the asymptotic
optimality of the Good-UCB algorithm in the macroscopic sense.

Corollary 1. Take $C = (1+\sqrt{2})\sqrt{c+2}$ with $c > 3/2$ in Algorithm 1. Under assumptions (i),
(ii) and (iii), for every sequence $(\lambda^N)_N$ converging to $\lambda$ as $N$ goes to infinity, almost surely

$$\limsup_{N \to +\infty} \frac{T^N_{UCB}(\lambda^N)}{N} \le \sum_{i: q_i > \lambda} \log \frac{q_i}{\lambda}.$$

Proof. Let $S^N = N^{2/3}$. First note that:

$$\ell^N = \lambda^N - \frac{3}{S^N} - 2(1+\sqrt{2}) \sqrt{\frac{c+2}{S^N}} \to \lambda \quad\text{when } N \to \infty.$$

Thus, by Theorem 2 and the fact that the OCL strategy needs at least as much time as the
omniscient oracle strategy in order to find the same number of items, there exists an event $\Omega$
of probability 1 on which

$$\limsup_{N \to +\infty} \frac{T^N_*(\ell^N)}{N} \le \sum_{i: q_i > \lambda} \log \frac{q_i}{\lambda}.$$

Thus, according to Theorem 1, for each positive integer $N$ there exists an event $A_N$ of probability
$P(A_N) \ge 1 - K/(c N^{2c/3})$ on which

$$\frac{T^N_{UCB}(\lambda^N)}{N} \le \frac{T^N_*(\ell^N)}{N} + \frac{K S^N}{N} \log\big( 8 T^N_*(\ell^N) + 16 K S^N \log(K S^N) \big) = \frac{T^N_*(\ell^N)}{N} + O\Big( \frac{\log(N)}{N^{1/3}} \Big).$$

Using the Borel-Cantelli lemma and the fact that, with our choice of parameters, $\sum_N N^{-2c/3} < \infty$,
we obtain that, except maybe on the set (of probability 0) $\Omega^c \cup \limsup_N A_N^c$,

$$\limsup_{N \to +\infty} \frac{T^N_{UCB}(\lambda^N)}{N} \le \limsup_{N \to +\infty} \frac{T^N_*(\ell^N)}{N} \le \sum_{i: q_i > \lambda} \log \frac{q_i}{\lambda},$$

which ends the proof. □



4 Simulations 

We provide a few simulations illustrating the behavior of the Good-UCB algorithm and the
asymptotic analysis of Section 3. We first consider an example with $K = 7$ different
sampling distributions satisfying assumptions (i), (ii) and (iii), with respective proportions of
interesting items $q_1 = 51.2\%$, $q_2 = 25.6\%$, $q_3 = 12.8\%$, $q_4 = 6.4\%$, $q_5 = 3.2\%$, $q_6 = 1.6\%$ and
$q_7 = 0.8\%$.

We have chosen to display here the numbers of items found as a function of the number
of draws (see (1)), instead of the times $T^N(\lambda^N)$, because they express more intuitively the
discovering possibilities of each algorithm. Note, however, that the correspondence between
these two quantities is straightforward, especially in the macroscopic limit: for $\lambda \in (0, q_1)$ let

$$T(\lambda) = \sum_{i: q_i > \lambda} \log \frac{q_i}{\lambda}. \qquad (4)$$

It is easy to show that the proportion of interesting items found by the OCL strategy after $Nt$
draws converges to

$$F(t) = \sum_{i=1}^{K} \big( q_i - T^{-1}(t) \big)_+ . \qquad (5)$$

Furthermore the latter expression is a lower bound for the corresponding proportion of
interesting items found by the Good-UCB algorithm. Proposition 3, proved in the Appendix,
provides a more explicit expression for $F$: denoting $q = \sum_{i=1}^{K} q_i$, there exists an increasing,
$\{1,\ldots,K\}$-valued function $I$ such that, for each $t$,

$$F(t) = q - I(t)\, \underline{q}_{I(t)} \exp\big( -t/I(t) \big),$$

where $\underline{q}_{I(t)}$ denotes the geometric mean of $q_1, \ldots, q_{I(t)}$. This permits an explicit comparison
of the macroscopic performance of the Good-UCB algorithm with uniform sampling: when all
distributions are sampled equally often, the proportion of unseen interesting items at time $t$ is
smaller than

$$\sum_{i=1}^{K} q_i \exp(-t/K) = K \bar{q}_K \exp(-t/K),$$

where $\bar{q}_K = \big( \sum_{i=1}^{K} q_i \big)/K$ is the arithmetic mean of the $(q_i)_i$. On the other hand, for the
Good-UCB algorithm, the proportion of unseen interesting items at time $t$ is smaller than

$$I(t)\, \underline{q}_{I(t)} \exp\big( -t/I(t) \big).$$

The ratio of those two quantities is a decreasing function of time lower-bounded by $\bar{q}_K / \underline{q}_K \ge 1$,
the ratio of the arithmetic mean to the geometric mean of the $(q_i)_i$. As expected, this ratio
gets larger when the proportions of interesting items among experts become more unbalanced.

Figure 1: Number of items found by Good-UCB (solid), the OCL (dashed), and uniform
sampling (dotted) as a function of time for sizes N = 128, N = 500, N = 1000 and N = 10000 in a
7-experts setting.
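
The functions $T$ of (4) and $F$ of (5) can be evaluated numerically. The sketch below inverts $T$ by bisection, which is one simple way (assumed here, not taken from the paper) to obtain $T^{-1}$; the function names, the tolerance and the two-expert example are hypothetical.

```python
import math

def T(lam, q):
    """Macroscopic waiting time (4): sum over q_i > lam of log(q_i / lam)."""
    return sum(math.log(qi / lam) for qi in q if qi > lam)

def T_inverse(t, q, tol=1e-12):
    """Inverse of T by bisection; T is continuous and decreasing on (0, max(q))."""
    lo, hi = 0.0, max(q)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if T(mid, q) > t:      # mid is too small: the waiting time is still above t
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def F(t, q):
    """Limiting proportion of interesting items found (5)."""
    lam = T_inverse(t, q)
    return sum(max(qi - lam, 0.0) for qi in q)

q = [0.512, 0.256]
t = T(0.128, q)                # equals 3 * log(2) for this choice of q
lam_back = T_inverse(t, q)
found = F(t, q)
```

For this example both experts are active at time `t`, so $T^{-1}(t)$ coincides with the geometric mean of $q_1, q_2$ multiplied by $\exp(-t/2)$.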

Figure 1 displays the number of items found as a function of time by Good-UCB (solid),
the OCL (dashed) and the uniform sampling scheme that alternates between experts (dotted).
The results are presented for sizes N = 128, N = 500, N = 1000 and N = 10000, each time
for one representative run (averaging over different runs removes the interesting variability of
the process). We chose to plot the number of items found rather than the waiting time $T$, as
the former is easier to visualize while the latter is easier to analyze. In fact, macroscopic
optimality in terms of number of items found could also be derived with the techniques of
Section 3. Figure 1 also shows clearly the macroscopic convergence of Good-UCB to the OCL.
Moreover, it can be seen that, even for very moderate values of N, Good-UCB significantly
outperforms uniform sampling, even if it is clearly outperformed by the OCL.

For these simulations, the parameter $C$ of Algorithm Good-UCB has been taken equal to
1/2, which is a rather conservative choice. In fact, it appears that during all rounds of all runs,
all upper-confidence bounds did contain the actual missing mass. Of course, a bolder choice of
$C$ can only improve the performance of the algorithm, as long as the confidence level remains
sufficient.

Figure 2: Number of prime numbers found by Good-UCB (solid), the OCL (dashed),
and uniform sampling (dotted) as a function of time, using geometric experts with means
100, 300, 500, 700 and 900, for C = 0.1 (left) and C = 0.02 (right).

In order to illustrate the efficiency of the Good-UCB algorithm in a more difficult setting,
which does not satisfy any of the assumptions (i), (ii) and (iii), we also considered the following
(artificial) example: K = 5 probabilistic experts draw independent sequences of geometrically
distributed random variables, with expectations 100, 300, 500, 700 and 900 respectively. The
set of interesting items is the set of prime numbers. We compare the oracle closed-loop policy,
Good-UCB and uniform sampling. The results are displayed in Figure 2. Even if the difference
between Good-UCB and the OCL remains significant, the former still performs significantly
better than uniform sampling during the entire discovery process. In this example, choosing
a smaller parameter $C$ seems to be preferable; this is due to the fact that the proportion of
interesting items on each arm is low. In that case, it may be possible to show, by using tighter
concentration inequalities, that the concentration of the Good-Turing estimator is actually
better than suggested by Proposition 1. In fact, this experiment suggests that the value of $C$
should be chosen smaller when the remaining missing mass is small.

Appendix

Lemma 1. Let $a, b, x > 0$ be such that $x \le a + b \log x$. Then one has

$$x \le a + b \log\big( 2a + 4b \log(4b) \big).$$

Proof. If $a \ge b \log x$ then $x \le 2a$ and thus $x \le a + b \log(2a)$. On the other hand, if $a < b \log x$
then $x \le 2b \log x$, which easily implies $x \le 4b \log(4b)$ and thus $x \le a + b \log(4b \log(4b))$. In any
case one has $x \le a + b \log(2a + 4b \log(4b))$. □
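
Lemma 1 turns an implicit inequality into an explicit bound. As a sanity check, the sketch below computes the largest $x$ satisfying $x \le a + b \log x$ by fixed-point iteration from a large starting value (a numerical device assumed here, not used in the paper) and compares it with the bound of the lemma on a few hypothetical values of $(a, b)$.

```python
import math

def largest_solution(a, b, x0=1e12, iters=200):
    """Largest x with x <= a + b*log(x), obtained by iterating x -> a + b*log(x)
    from a large starting point (the iterates decrease towards the largest fixed point)."""
    x = x0
    for _ in range(iters):
        x = a + b * math.log(x)
    return x

def lemma1_bound(a, b):
    """Explicit bound of Lemma 1: a + b*log(2a + 4b*log(4b))."""
    return a + b * math.log(2 * a + 4 * b * math.log(4 * b))

cases = [(10.0, 3.0), (1.0, 50.0), (100.0, 0.5)]
results = [(largest_solution(a, b), lemma1_bound(a, b)) for a, b in cases]
```

In each case the first component of the pair (the largest admissible $x$) stays below the second (the explicit bound).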

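Lemma 2. For all $1 \le k \le n$,

$$-\frac{1}{k} + \log \frac{n}{k} \le \sum_{j=k+1}^{n} \frac{1}{j} \le \log \frac{n}{k}.$$

Proof. The standard sum/integral comparison yields

$$\log \frac{n+1}{k+1} \le \sum_{j=k+1}^{n} \frac{1}{j} \le \log \frac{n}{k},$$

but

$$\log \frac{n+1}{k+1} = \log \frac{n}{k} + \log\Big( 1 + \frac{1}{n} \Big) - \log\Big( 1 + \frac{1}{k} \Big) \ge \log \frac{n}{k} - \frac{1}{k}.$$

□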
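
Lemma 2 is what links the expected discovery times of the proof of Theorem 2 to logarithms. The sketch below checks the lemma on a range of $(k, n)$ and uses the fact that a geometric variable with success probability $p$ has mean $1/p$ to compute the expected number of draws needed to find $m$ of the $Q$ interesting items of one expert. The function names and the numerical values are hypothetical.

```python
import math

def harmonic_tail(k, n):
    """Sum of 1/j for j = k+1, ..., n."""
    return sum(1.0 / j for j in range(k + 1, n + 1))

def expected_discovery_time(N, Q, m):
    """Expected number of uniform draws among N items needed to find m of the Q
    interesting ones: sum over k = 1..m of N / (Q + 1 - k)."""
    return sum(N / (Q + 1 - k) for k in range(1, m + 1))

# Example: q = Q/N = 0.5 and lambda = 0.1, so Q - m = N * lambda = 100 items stay undiscovered.
N, Q, m = 1000, 500, 400
normalized_time = expected_discovery_time(N, Q, m) / N
target = math.log(Q / (Q - m))     # log(q / lambda) = log(5)
```

By Lemma 2 with $k = Q - m$ and $n = Q$, `normalized_time` lies between `target - 1/(Q - m)` and `target`.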



The open-loop oracle policy 

In this final section, we provide a macroscopic analysis of the open-loop oracle policy in the
case of uniform sampling, that is under Hypotheses (i), (ii) and (iii). An open-loop policy must
choose, for each horizon $t$, the respective numbers of requests $(n_1^N, \ldots, n_K^N)$ for each distribution
(so that $n_1^N + \cdots + n_K^N = t^N$) in advance. It appears here that, in the limit, the oracle open-loop
(OOL) policy, which makes use of the parameters $(Q_1^N, \ldots, Q_K^N)$, is as good as the OCL policy.

Let here $R^N_{i,n_i^N} = (Q_i^N - F_i^N(n_i^N))/N$ be the proportion of interesting items not yet found
with expert $i$ after $n_i^N$ requests. Suppose that $t^N/N \to t$, and that $n_i^N/N \to \nu_i$ as $N$ goes to
infinity; it is easily shown that, almost surely,

$$\lim_{N \to \infty} R^N_{i,n_i^N} = \lim_{N \to \infty} \mathbb{E}\big[ R^N_{i,n_i^N} \big] = \lim_{N \to \infty} \frac{Q_i^N \big( 1 - \frac{1}{N} \big)^{n_i^N}}{N} = q_i \exp(-\nu_i).$$

Hence, the proportion of interesting items found with the allocation $(n_1^N, \ldots, n_K^N)$ almost surely
converges to $\sum_{i=1}^{K} q_i (1 - \exp(-\nu_i))$. Defining

$$r(\nu) = \sum_{i=1}^{K} q_i \exp(-\nu_i),$$

it follows that finding the best macroscopic allocation reduces to the following constrained
convex minimization problem:

$$\min r(\nu) \quad\text{such that}\quad \nu_1 + \cdots + \nu_K = t \ \text{ and } \ \forall i,\ \nu_i \ge 0.$$

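The solution $r^*(t)$, reached at $\nu = \nu^*(t)$, is easily derived by classical optimization techniques:

Proposition 3. For every $i \in \{1,\ldots,K\}$, let $\underline{q}_i = \exp\big( \frac{1}{i} \sum_{k=1}^{i} \log q_k \big)$ denote the
geometric mean of $q_1, \ldots, q_i$.

1. There exists $I(t) \in \{1,\ldots,K\}$ such that

$$\forall i \le I(t), \quad \nu_i^*(t) = \frac{t}{I(t)} + \log \frac{q_i}{\underline{q}_{I(t)}},$$
$$\forall i > I(t), \quad \nu_i^*(t) = 0.$$

Hence,

$$r^*(t) = I(t)\, \underline{q}_{I(t)} \exp\Big( -\frac{t}{I(t)} \Big) + \sum_{i > I(t)} q_i.$$

2. There exist $0 = t_1 < \cdots < t_K < +\infty$ such that (with the convention $t_{K+1} = +\infty$)

$$\forall t \in [t_i, t_{i+1}[, \quad I(t) = i.$$

The $(t_i)_{i \ge 2}$ are such that

$$q_i + (i-1)\, \underline{q}_{i-1} \exp\Big( -\frac{t_i}{i-1} \Big) = i\, \underline{q}_i \exp\Big( -\frac{t_i}{i} \Big).$$

For instance, $t_2 = \log(q_1/q_2)$.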
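Proof: Introduce the Lagrangian:

$$L(\nu_1, \ldots, \nu_K, \lambda, \mu_1, \ldots, \mu_K) = \sum_{i=1}^{K} q_i \exp(-\nu_i) + \lambda \Big( \sum_{i=1}^{K} \nu_i - t \Big) - \sum_{i=1}^{K} \mu_i \nu_i.$$

We need to find the solution of:

$$\forall i \in \{1,\ldots,K\}, \quad -q_i \exp(-\nu_i) + \lambda - \mu_i = 0,$$
$$\sum_{i=1}^{K} \nu_i = t,$$
$$\forall i \in \{1,\ldots,K\}, \quad \nu_i \mu_i = 0 \ \text{ and } \ \mu_i \ge 0.$$

We first obtain that

$$\nu_i = \log q_i - \log(\lambda - \mu_i).$$

Denoting $\mathcal{A} = \{i : \nu_i > 0\}$, and using that $i \in \mathcal{A} \Rightarrow \mu_i = 0$, we get:

$$t = \sum_{i \in \mathcal{A}} \log(q_i) - |\mathcal{A}| \log(\lambda),$$

from which we get

$$\log(\lambda) = \frac{1}{|\mathcal{A}|} \sum_{i \in \mathcal{A}} \log q_i - \frac{t}{|\mathcal{A}|},$$

and then for all $i \in \mathcal{A}$:

$$\nu_i = \log q_i + \frac{t}{|\mathcal{A}|} - \frac{1}{|\mathcal{A}|} \sum_{j \in \mathcal{A}} \log q_j.$$

Next, observe that $\nu_i = 0 \iff q_i \le \lambda$: in fact, if $\nu_i = 0$ then the first equation gives
$-q_i + \lambda - \mu_i = 0$, and $0 \le \mu_i = \lambda - q_i$. Conversely, if $\nu_i > 0$ then $\mu_i = 0$ and $\nu_i = \log(q_i/\lambda) > 0$
implies $q_i > \lambda$. Thus, there exists $I(t)$ such that $\mathcal{A} = \{1, \ldots, I(t)\}$, and for all $i \le I(t)$,

$$\nu_i = \log \frac{q_i}{\underline{q}_{I(t)}} + \frac{t}{I(t)}.$$

Moreover,

$$r^*(t) = r(\nu_1, \ldots, \nu_{I(t)}, 0, \ldots, 0)$$
$$= \sum_{i \le I(t)} q_i \exp\Big( -\log \frac{q_i}{\underline{q}_{I(t)}} - \frac{t}{I(t)} \Big) + \sum_{i > I(t)} q_i$$
$$= I(t)\, \underline{q}_{I(t)} \exp\Big( -\frac{t}{I(t)} \Big) + \sum_{i > I(t)} q_i.$$

The instants $(t_i)_{2 \le i \le K}$ are such that

$$(i-1)\, \underline{q}_{i-1} \exp\Big( -\frac{t_i}{i-1} \Big) + \sum_{k > i-1} q_k = i\, \underline{q}_i \exp\Big( -\frac{t_i}{i} \Big) + \sum_{k > i} q_k,$$

which is equivalent to

$$q_i + (i-1)\, \underline{q}_{i-1} \exp\Big( -\frac{t_i}{i-1} \Big) = i\, \underline{q}_i \exp\Big( -\frac{t_i}{i} \Big).$$

For $i = 2$, this gives

$$0 = q_2 + q_1 \exp(-t_2) - 2 \sqrt{q_1 q_2} \exp\Big( -\frac{t_2}{2} \Big) = \Big( \sqrt{q_2} - \sqrt{q_1} \exp\Big( -\frac{t_2}{2} \Big) \Big)^2,$$

which leads to $t_2 = \log(q_1/q_2)$.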
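
The closed form of Proposition 3 can be evaluated directly. The sketch below selects $I(t)$ as the largest index for which the smallest active allocation $\nu_{I}$ is nonnegative (a selection rule derived here from the KKT conditions of the proof and stated as an assumption of this sketch), then returns the allocation and the residual $r(\nu)$. Function names and the two-expert example are hypothetical.

```python
import math

def optimal_allocation(q, t):
    """Macroscopic open-loop allocation for proportions q (sorted in decreasing order).
    Returns (I, nu) where experts 1..I are active and nu sums to t."""
    K = len(q)
    for I in range(K, 0, -1):
        log_gm = sum(math.log(qk) for qk in q[:I]) / I   # log of the geometric mean of q_1..q_I
        if t / I + math.log(q[I - 1]) - log_gm >= 0:     # smallest active allocation is >= 0
            break
    nu = [t / I + math.log(q[i]) - log_gm if i < I else 0.0 for i in range(K)]
    return I, nu

def residual(q, nu):
    """r(nu): limiting proportion of interesting items left undiscovered."""
    return sum(qi * math.exp(-v) for qi, v in zip(q, nu))

q = [0.5, 0.25]
I_late, nu_late = optimal_allocation(q, 1.0)     # t above log(q_1/q_2): both experts active
I_early, nu_early = optimal_allocation(q, 0.3)   # t below log(q_1/q_2): only expert 1 active
r_opt = residual(q, nu_late)
r_uniform = residual(q, [0.5, 0.5])
```

For $I = 1$ the test in the loop always succeeds, so the loop terminates with a valid active set; on this example the optimal allocation leaves fewer undiscovered items than the even split.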



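Theorem 3. In the macroscopic limit, the proportion of items found by the open-loop oracle
policy uniformly converges to the function $F$ defined in Equation (5).

The proportion of interesting items found by the OOL policy is

$$q - r^*(t) = \sum_{i \le I(t)} \Big( q_i - \underline{q}_{I(t)} \exp\Big( -\frac{t}{I(t)} \Big) \Big) = \sum_{i=1}^{K} \big( q_i - \lambda(t) \big)_+ ,$$

where $\lambda(t) = \underline{q}_{I(t)} \exp\big( -\frac{t}{I(t)} \big) \in [0, q_{I(t)}]$. To conclude, it remains only to remark that $\lambda = T^{-1}$,
where $T$ is defined in Equation (4). In fact, if $\lambda$ is such that $q_{I_0+1} \le \lambda < q_{I_0}$, then $I(T(\lambda)) = I_0$
and

$$\lambda(T(\lambda)) = \underline{q}_{I_0} \exp\Big( -\frac{T(\lambda)}{I_0} \Big) = \exp\Big( \frac{1}{I_0} \sum_{i \le I_0} \log q_i \Big) \exp\Big( -\frac{1}{I_0} \sum_{i \le I_0} \log \frac{q_i}{\lambda} \Big) = \lambda.$$

If $\lambda < q_K$, the same holds with $I_0 = K$.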